JavaScript code is parsed and compiled into bytecode through a multi-stage process: lexical analysis breaks source code into tokens, syntactic analysis builds an Abstract Syntax Tree (AST), and the interpreter generates platform-independent bytecode for execution.
Before any JavaScript code can execute, it must be transformed from human-readable source text into a form the engine can understand and run. Modern engines like V8 use a multi-stage pipeline: the scanner performs lexical analysis to create tokens, the parser performs syntactic analysis to build an Abstract Syntax Tree (AST), and the interpreter (Ignition in V8) generates bytecode from the AST. This bytecode is then executed directly by the interpreter and may later be optimized to machine code by the JIT compiler.
The scanner reads the source code character by character and groups characters into meaningful tokens—the smallest units of the language, such as keywords, identifiers, operators, and literals.
This process is typically implemented as a deterministic finite automaton (DFA) that transitions between states based on input characters.
For example, the code const x = 42; becomes the tokens const (keyword), x (identifier), = (operator), 42 (numeric literal), and ; (punctuation).
Modern engines use hand-written scanners optimized for speed rather than generated lexers.
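The DFA idea can be made concrete with a toy scanner. The sketch below is illustrative only—real scanners handle strings, comments, Unicode identifiers, and far more—but it shows the state-per-character-class structure and tokenizes the `const x = 42;` example above.

```javascript
// Minimal DFA-style tokenizer sketch (illustrative; not V8's actual scanner).
function tokenize(source) {
  const keywords = new Set(['const', 'let', 'var', 'function', 'return']);
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const c = source[i];
    if (/\s/.test(c)) { i++; continue; }                    // skip whitespace
    if (/[A-Za-z_$]/.test(c)) {                             // identifier/keyword state
      let j = i;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      const text = source.slice(i, j);
      tokens.push({ type: keywords.has(text) ? 'keyword' : 'identifier', text });
      i = j;
    } else if (/[0-9]/.test(c)) {                           // numeric-literal state
      let j = i;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push({ type: 'number', text: source.slice(i, j) });
      i = j;
    } else if ('=+-*/'.includes(c)) {
      tokens.push({ type: 'operator', text: c }); i++;
    } else if (';(){},'.includes(c)) {
      tokens.push({ type: 'punctuation', text: c }); i++;
    } else {
      throw new SyntaxError(`Unexpected character '${c}'`);
    }
  }
  return tokens;
}

// tokenize('const x = 42;') yields: keyword const, identifier x,
// operator =, number 42, punctuation ;
```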
The parser consumes the token stream and builds an Abstract Syntax Tree (AST)—a hierarchical representation of the code's grammatical structure.
Each node in the AST represents a language construct: FunctionDeclaration, BlockStatement, BinaryExpression, and so on.
V8 uses a hand-written recursive descent parser that can parse JavaScript at speeds exceeding 1 MB/s.
The parser also performs early syntax checking: if there is a syntax error, it is detected and reported here, before any code runs.
Two parsing modes exist: eager parsing, which builds the full AST immediately, and lazy parsing, which skips function bodies until they are needed, improving startup time.
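Recursive descent means one function per grammar rule, with operator precedence encoded by which rule calls which. A toy sketch for binary + and * expressions (a tiny fraction of real JavaScript grammar, with a regex standing in for the scanner) shows how BinaryExpression nodes are nested so that * binds tighter than +:

```javascript
// Toy recursive descent parser sketch (not V8's grammar): builds an AST
// for expressions like "1 + 2 * 3" over integer literals.
function parseExpression(src) {
  const tokens = src.match(/\d+|[+*]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function parsePrimary() {            // rule: NumericLiteral
    return { type: 'NumericLiteral', value: Number(next()) };
  }
  function parseProduct() {            // rule: '*' (higher precedence)
    let node = parsePrimary();
    while (peek() === '*') {
      next();
      node = { type: 'BinaryExpression', operator: '*', left: node, right: parsePrimary() };
    }
    return node;
  }
  function parseSum() {                // rule: '+' (lower precedence)
    let node = parseProduct();
    while (peek() === '+') {
      next();
      node = { type: 'BinaryExpression', operator: '+', left: node, right: parseProduct() };
    }
    return node;
  }
  return parseSum();
}

// parseExpression('1 + 2 * 3') roots the tree at '+', with the '*'
// subexpression nested on the right—precedence falls out of the call structure.
```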
In V8, the AST is passed to the Ignition interpreter, which generates platform-independent bytecode.
Bytecode is a compact, low-level representation—about 25–50% the size of equivalent baseline machine code—which reduces memory usage.
Each bytecode instruction is a single-byte opcode followed by optional operands, making it efficient to decode and execute.
The bytecode generator traverses the AST and emits instructions for each node: LdaSmi [42] (load a small integer), Add (add a register value to the accumulator), Return (return from the function).
Ignition uses a register-based architecture rather than a stack-based one, which lowers the instruction count and improves performance.
The bytecode also includes slots for feedback vectors—metadata that collects type information during execution for later optimization.
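This stage can be observed directly: Node.js embeds V8 and passes V8 flags through, so `node --print-bytecode` dumps Ignition's output. The commented listing below is an illustrative shape only—exact bytecodes, registers, and feedback slots vary by V8 version:

```javascript
// Run with: node --print-bytecode --print-bytecode-filter=add thisfile.js
function add(a, b) {
  return a + b;
}
// Lazy parsing means the body is only compiled once the function is called.
add(1, 2);

// Typical (abridged) Ignition bytecode for `add` looks roughly like:
//   Ldar a1       ; load argument b into the accumulator
//   Add a0, [0]   ; add argument a, recording type feedback in slot [0]
//   Return        ; return the accumulator
```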
Most functions in a typical web page are never called during the initial page load, so parsing and compiling them eagerly wastes time and memory.
V8 therefore uses lazy parsing: function bodies are not fully parsed or compiled until the function is first called.
When the parser encounters a function declaration, it skips the body, storing only enough information to locate it later (pre-parsing).
When the function is called for the first time, it is fully parsed and bytecode is generated for it.
This technique reduces initial parse time by up to 40% and significantly cuts memory usage.
The trade-off is a small delay on the first call to each function, which is usually acceptable.
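How a function appears in source affects this heuristic. One documented case: wrapping a function expression in parentheses (a "PIFE", possibly-invoked function expression) signals to V8 that it will likely run immediately, so the body is compiled eagerly instead of being pre-parsed and then re-parsed on the first call. A sketch:

```javascript
// Plain declaration: pre-parsed only; the body is skipped until first call.
function rarelyUsed() {
  return 'compiled on first call';
}

// Parenthesized IIFE ("PIFE"): the leading `(` hints immediate invocation,
// so V8 parses and compiles the body eagerly during the initial parse.
const result = (function immediatelyUsed() {
  return 'compiled eagerly';
})();
```

Both forms behave identically at the language level; the difference is purely when the engine spends parse and compile time.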
Recent V8 versions added streaming compilation for script tags with the async or defer attribute.
As the script downloads, parsing and compilation proceed incrementally on a background thread.
By the time the download completes, the script may already be fully parsed and compiled, eliminating that startup overhead.
This technique reduced parse/compile time on the Speedometer 2.0 benchmark by up to 23%.
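From the page author's side, enabling this is just a matter of how the script tag is written (the paths below are hypothetical):

```html
<!-- defer: download and compile off the main thread; execute after HTML parsing -->
<script src="/js/app.bundle.js" defer></script>
<!-- async: download and compile off the main thread; execute as soon as ready -->
<script src="/js/analytics.js" async></script>
```

Both attributes make the fetch non-blocking, which is what gives the engine a window to parse and compile on a background thread while the download is still in flight.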
The entire parsing and bytecode generation pipeline is heavily optimized for real-world web workloads. The combination of lazy parsing, streaming compilation, and efficient bytecode design means that even large JavaScript applications can start up quickly. The bytecode itself serves as both the input for execution (via Ignition) and the source of profiling data for TurboFan, creating a seamless pipeline from source code to optimized machine code.